gpu offload host code generation #142097

ZuseZ4 · 2025-06-05T20:01:04Z

r? ghost

This will generate most of the host-side code needed to use LLVM's offload feature.
The first PR will only handle automatic memory transfers to and from the device.
So if a user calls a kernel, we will copy the inputs to the device and back, but we won't do the actual kernel launch.
Before merging, we will use LLVM's info infrastructure to verify that the memcopies match what OpenMP offload generates in C++. `LIBOMPTARGET_INFO=-1 ./my_rust_binary` should print that a memcpy to, and later from, the device is happening.
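To make the scope of this first step concrete, here is a minimal host-only sketch of the data movement described above. It assumes a hypothetical `MockDevice` that stands in for the GPU; none of these names come from rustc or the LLVM offload runtime, and the real code emits calls into the offload runtime rather than copying into a `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a GPU: device memory is modeled as byte buffers
/// keyed by the host address they mirror. Not part of rustc or LLVM offload.
#[derive(Default)]
pub struct MockDevice {
    buffers: HashMap<usize, Vec<u8>>,
    /// Log of transfers, loosely imitating what an info dump might report.
    pub log: Vec<String>,
}

impl MockDevice {
    /// Copy `host` to the device ("host2device").
    pub fn map_to(&mut self, host: &[u8]) {
        let key = host.as_ptr() as usize;
        self.buffers.insert(key, host.to_vec());
        self.log.push(format!("memcpy host->device: {} bytes", host.len()));
    }

    /// Copy the mirrored device buffer back into `host` ("device2host").
    pub fn map_from(&mut self, host: &mut [u8]) {
        let key = host.as_ptr() as usize;
        if let Some(dev) = self.buffers.remove(&key) {
            host.copy_from_slice(&dev);
            self.log.push(format!("memcpy device->host: {} bytes", host.len()));
        }
    }
}

/// What the host side does around a kernel call in this first step:
/// transfer in, skip the launch, transfer out.
pub fn call_kernel_host_side(dev: &mut MockDevice, data: &mut [u8]) {
    dev.map_to(data);
    // Kernel launch intentionally omitted; that is left to a follow-up PR.
    dev.map_from(data);
}
```

Since no kernel runs, the data comes back unchanged; the only observable effect is the pair of transfers recorded in `log`, which is the kind of evidence the `LIBOMPTARGET_INFO` check is meant to provide for the real runtime.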

A follow-up PR will generate the actual device-side kernel which will then do computations on the GPU.
A third PR will implement manual host2device and device2host functionality, but the goal is to minimize cases where a user has to override our default handling due to performance issues.

I'm trying to get a full MVP out first, so this just recognizes GPU functions based on magic names. The final frontend will obviously move this over to use proper macros, like I'm already doing for the autodiff work.
This work will also be compatible with std::autodiff, so one can differentiate GPU kernels.
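As an illustration of what recognizing GPU functions by magic name could look like, here is a small sketch. The prefix `kernel_` is purely an assumption for this example; the PR description does not state which names are actually matched.

```rust
/// Hypothetical prefix; the PR description does not state the actual magic name.
pub const MAGIC_PREFIX: &str = "kernel_";

/// Decide purely by name whether a function is treated as a GPU kernel.
pub fn is_offload_candidate(fn_name: &str) -> bool {
    fn_name.starts_with(MAGIC_PREFIX)
}

/// Filter a list of function names down to the offload candidates.
pub fn offload_candidates<'a>(names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|n| is_offload_candidate(n))
        .collect()
}
```

Name matching like this is easy to bootstrap but brittle, which is presumably why the plan is to replace it with proper macros in the final frontend.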

Tracking:

Tracking Issue for GPU-offload #131513

ZuseZ4 · 2025-06-10T03:09:23Z

@oli-obk Feature-wise, I am almost done. I'll add a few more lines to describe the layout of Rust types to the offload library, but in this PR I only intend to support one or two types (maybe arrays, raw pointers, or slices). I might even hardcode the length in the very first approach. In a follow-up PR I'll do some proper type parsing on a higher level, similar to what I did in the past with Rust TypeTrees. This work is much simpler and more reliable though, since offload doesn't care what type something has, just how many bytes it occupies and hence how many need to be moved to/from the GPU.
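The "only the byte count matters" point can be sketched with the standard library alone. The helper names below are hypothetical and not taken from the PR; they only show how a size in bytes falls out for arrays and slices, and why a raw pointer needs a separately supplied (possibly hardcoded) length.

```rust
use std::mem::{size_of, size_of_val};

/// Number of bytes a transfer would need for `value`.
/// Works for sized values, arrays, and slices alike.
pub fn transfer_bytes<T: ?Sized>(value: &T) -> usize {
    size_of_val(value)
}

/// A raw pointer carries no length, so the element count has to come from
/// somewhere else (here: a caller-supplied `len`).
pub fn transfer_bytes_raw<T>(_ptr: *const T, len: usize) -> usize {
    len * size_of::<T>()
}
```

For example, `transfer_bytes(&[0.0f64; 256])` yields 2048, and a raw pointer to the same array needs `len = 256` passed explicitly to arrive at the same number.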

I was able to just move a few of the builder methods I needed to the generic builder.
However, there are also around 7 that I had to duplicate. I guess at some point I'll need to do the proper work of enabling the trait implementations for both builders :/
Once I have everything working, I'll clean it up and add some tests and docs.
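One way to picture the "trait implementations for both builders" cleanup is a shared trait with provided methods, so a helper is written once instead of duplicated. This is a generic sketch with made-up names (`EmitInstr`, `BuilderA`, `BuilderB`); it does not reflect the actual builder types or traits in `rustc_codegen_llvm`.

```rust
/// Hypothetical minimal interface that both builders would provide.
pub trait EmitInstr {
    /// The only method each builder has to implement itself.
    fn emit(&mut self, instr: String);

    /// Shared helper, written once as a provided method instead of being
    /// duplicated on each builder type.
    fn emit_call(&mut self, callee: &str, args: &[&str]) {
        self.emit(format!("call @{}({})", callee, args.join(", ")));
    }
}

/// Two illustrative builder types that each record emitted instructions.
#[derive(Default)]
pub struct BuilderA {
    pub instrs: Vec<String>,
}

#[derive(Default)]
pub struct BuilderB {
    pub instrs: Vec<String>,
}

impl EmitInstr for BuilderA {
    fn emit(&mut self, instr: String) {
        self.instrs.push(instr);
    }
}

impl EmitInstr for BuilderB {
    fn emit(&mut self, instr: String) {
        self.instrs.push(instr);
    }
}
```

With this shape, `emit_call` is available on both types without a second copy of its body.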

compiler/rustc_codegen_llvm/src/back/lto.rs

compiler/rustc_codegen_llvm/src/context.rs

ZuseZ4 · 2025-06-12T14:32:20Z

Not fully ready yet; I apparently missed yet another global needed to initialize the offload runtime. But at least it compiles successfully to a binary if I emit the IR from Rust and then use clang for the rest. I'll add the global today, then I should be done and will clean it up.

ZuseZ4 · 2025-07-18T23:41:30Z

Sorry for all the pings; the ways in which I break git at times never cease to amaze me.

I'll leave the helpers for now, till I have the full code online. If they're still not helpful by then I guess I'll drop them.

@bors r+

bors · 2025-07-18T23:41:33Z

📌 Commit c068599 has been approved by ZuseZ4

It is now in the queue for this repository.

ZuseZ4 · 2025-07-18T23:42:54Z

@bors r-

(Let me actually add oli as the correct reviewer)

ZuseZ4 · 2025-07-18T23:44:37Z

@bors r=oli-obk

bors · 2025-07-18T23:44:40Z

📌 Commit c068599 has been approved by oli-obk

It is now in the queue for this repository.

Rollup of 7 pull requests

Successful merges:

- #142097 (gpu offload host code generation)
- #143906 (Miri: non-deterministic floating point operations in `foreign_items`)
- #144144 (tests: Skip supported-crate-types test on musl hosts)
- #144159 (opt-dist: change build_dir field to be an actual build dir)
- #144162 (Debug impls for DropElaborators)
- #144189 (Add non-regression test for #144168)
- #144216 (Don't consider unstable fields always-inhabited)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup of 6 pull requests

Successful merges:

- #142097 (gpu offload host code generation)
- #143906 (Miri: non-deterministic floating point operations in `foreign_items`)
- #144144 (tests: Skip supported-crate-types test on musl hosts)
- #144162 (Debug impls for DropElaborators)
- #144189 (Add non-regression test for #144168)
- #144216 (Don't consider unstable fields always-inhabited)

r? `@ghost`
`@rustbot` modify labels: rollup

rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 5, 2025

ZuseZ4 added F-gpu_offload `#![feature(gpu_offload)]` and removed A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 5, 2025

rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 5, 2025

rustbot added the F-autodiff `#![feature(autodiff)]` label Jun 9, 2025

ZuseZ4 mentioned this pull request Mar 4, 2025

Tracking Issue for GPU-offload #131513

Open

5 tasks

oli-obk reviewed Jun 12, 2025

compiler/rustc_codegen_llvm/src/back/lto.rs (resolved)

oli-obk reviewed Jun 12, 2025

compiler/rustc_codegen_llvm/src/context.rs (outdated, resolved)

oli-obk reviewed Jun 12, 2025

compiler/rustc_codegen_llvm/src/context.rs (outdated, resolved)

ZuseZ4 mentioned this pull request Jun 15, 2025

Expose experimental LLVM features for GPU offloading rust-lang/rust-project-goals#109

Open

4 tasks

ZuseZ4 force-pushed the offload-host1 branch from c8d7349 to 1c1953d Compare June 17, 2025 04:02

rustbot added the T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) label Jun 17, 2025

ZuseZ4 force-pushed the offload-host1 branch from 1c1953d to f185093 Compare June 17, 2025 21:35

ZuseZ4 removed A-CI Area: Our Github Actions CI A-run-make Area: port run-make Makefiles to rmake.rs F-autodiff `#![feature(autodiff)]` A-tidy Area: The tidy tool A-rustc-dev-guide Area: rustc-dev-guide labels Jul 18, 2025

ZuseZ4 added 3 commits July 18, 2025 16:30

gpu host code generation

4a1a5a4

add gpu offload codegen host side test

e2ab312

add unstable-books doc for offload

c068599

ZuseZ4 force-pushed the offload-host1 branch from 6bc5f56 to c068599 Compare July 18, 2025 23:31

rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. F-autodiff `#![feature(autodiff)]` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 18, 2025

bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jul 18, 2025

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 18, 2025

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 18, 2025

GuillaumeGomez mentioned this pull request Jul 20, 2025

Rollup of 7 pull requests #144228

Closed

GuillaumeGomez mentioned this pull request Jul 20, 2025

Rollup of 6 pull requests #144231

Closed

jieyouxu mentioned this pull request Jul 21, 2025

Rollup of 10 pull requests #144245

Closed
